Improve copy 16 #264
Codecov Report: ✅ All modified and coverable lines are covered by tests.
|
This change removes only one (1) instruction, yet gives a ~2% performance improvement on the benchmarks.
1dce4fc to fa27fe2
|
Do you have a diff of the functions that get optimized better with this?
|
This change is most relevant for |
|
You can replace the body of |
|
I don't see any difference when I do that. Edit: Never mind. There were two |
|
Strangely, even though we now emit the same assembly, and the function is called the same number of times according to cachegrind, we still spend much more time on it than the C version does. Alignment doesn't seem to change anything, and I don't see how it could be the cache.
The copy operation will ultimately use the same instructions. However, presenting this as a 128-bit load and store to LLVM from the start optimizes better than the `memmove` that `ptr::copy` generates.

There isn't that much to review really, but maybe you have useful thoughts/notes?
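
For readers without the diff at hand, the two formulations described above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the function names `copy16_memmove` and `copy16_u128` are hypothetical, and using `u128` with `read_unaligned`/`write_unaligned` is an assumption about how a 128-bit load and store might be expressed (a SIMD intrinsic would be another option).

```rust
use std::ptr;

/// Copy 16 bytes with `ptr::copy`, which has memmove semantics
/// (source and destination may overlap).
///
/// # Safety
/// `src` must be valid for 16 bytes of reads, `dst` for 16 bytes of writes.
pub unsafe fn copy16_memmove(src: *const u8, dst: *mut u8) {
    unsafe { ptr::copy(src, dst, 16) }
}

/// Copy 16 bytes as a single unaligned 128-bit load followed by a single
/// unaligned 128-bit store. The whole chunk is loaded before anything is
/// written, so overlapping regions behave the same as with memmove.
///
/// # Safety
/// `src` must be valid for 16 bytes of reads, `dst` for 16 bytes of writes.
pub unsafe fn copy16_u128(src: *const u8, dst: *mut u8) {
    unsafe {
        // Hypothetical formulation: hand the optimizer a plain u128 value
        // instead of a call it has to recognize and lower later.
        let chunk = ptr::read_unaligned(src as *const u128);
        ptr::write_unaligned(dst as *mut u128, chunk);
    }
}
```

Both functions should produce the same bytes in the destination; the difference discussed in this PR is only in how well the surrounding code optimizes.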